## [1] "Mon Mar 25 10:48:18 2019"

Studies

Read file of all studies in AACT.

## [1] "Total studies: 300214 ; unique NCT_IDs: 300214"

Interventional studies only

Select only Interventional study_type.

## [1] "Interventional studies: 237892 (79.2%)"

All interventional studies, by phase

phase                N
Early Phase 1     2619
Phase 1          29795
Phase 1/Phase 2  10063
Phase 2          41637
Phase 2/Phase 3   4963
Phase 3          29662
Phase 4          25001
NA               94152
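
The selection and tally above can be sketched in a few lines. This is a minimal illustration, not the report's own code: the four sample rows and their NCT IDs are made up, and the column names `nct_id`, `study_type`, and `phase` are assumed to match the AACT studies file.

```python
from collections import Counter

# Hypothetical sample rows; the real input would be the AACT studies file.
studies = [
    {"nct_id": "NCT-A", "study_type": "Interventional", "phase": "Phase 2"},
    {"nct_id": "NCT-B", "study_type": "Observational", "phase": None},
    {"nct_id": "NCT-C", "study_type": "Interventional", "phase": None},
    {"nct_id": "NCT-D", "study_type": "Interventional", "phase": "Phase 2"},
]


def select_interventional(rows):
    """Keep only rows whose study_type is 'Interventional'."""
    return [r for r in rows if r["study_type"] == "Interventional"]


def count_by_phase(rows):
    """Tally studies per phase; a missing phase is reported as 'NA'."""
    return Counter(r["phase"] if r["phase"] is not None else "NA" for r in rows)


interventional = select_interventional(studies)
pct = 100.0 * len(interventional) / len(studies)
print(f"Interventional studies: {len(interventional)} ({pct:.1f}%)")
for phase, n in sorted(count_by_phase(interventional).items()):
    print(phase, n)
```

In the table above, the per-phase counts sum to 237892, the number of interventional studies, so every study falls in exactly one row, including NA.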

Drugs

Read file of all drugs in AACT.

- id is AACT ID.
- Note that one study may involve multiple drugs.

## [1] "Unique drug names: 91347 ; unique intervention IDs: 255077"

Studies: drug trials only

Select only studies involving drugs.

## [1] "Drug trials: 124421 ; unique NCT_IDs: 124421"

Merge study metadata with drugs.

All drugs, by study phase

phase                N
Early Phase 1     2615
Phase 1          48593
Phase 1/Phase 2  13288
Phase 2          68850
Phase 2/Phase 3   6503
Phase 3          49507
Phase 4          36331
NA               29390
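
A sketch of the merge step, assuming a left join of drug rows onto study metadata by `nct_id`. The join type is an assumption, although the table above sums to 255077, the number of unique intervention IDs, which is consistent with every drug row being retained. All sample rows and IDs are hypothetical.

```python
from collections import Counter

# Hypothetical study metadata: nct_id -> phase.
studies = {
    "NCT-A": "Phase 1",
    "NCT-B": "Phase 3",
}

# Hypothetical drug rows: one row per intervention, several may share a study.
drugs = [
    {"id": 1, "nct_id": "NCT-A", "name": "drug x"},
    {"id": 2, "nct_id": "NCT-A", "name": "drug y"},
    {"id": 3, "nct_id": "NCT-B", "name": "drug x"},
    {"id": 4, "nct_id": "NCT-C", "name": "drug z"},  # no matching study row
]


def merge_drugs_with_studies(drug_rows, study_phase):
    """Left join: keep every drug row; a missing study gives phase 'NA'."""
    return [dict(d, phase=study_phase.get(d["nct_id"], "NA")) for d in drug_rows]


merged = merge_drugs_with_studies(drugs, studies)
by_phase = Counter(row["phase"] for row in merged)
for phase, n in sorted(by_phase.items()):
    print(phase, n)
```

Because one study may involve multiple drugs, per-phase drug counts can exceed per-phase study counts, as with Phase 1 here.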

NextMove LeadMine NER

AACT drug names are resolved to standard names and, via SMILES, to chemical structures.

## [1] "Drugs with resolved structure: 180555 / 197300 (91.5%)"

All drugs, by overall_status

overall_status               N
Completed               114900
Recruiting               23262
Terminated               15384
Unknown status           15111
Active, not recruiting   10409
NA                        5675
Not yet recruiting        5604
Withdrawn                 5475
Enrolling by invitation    741
Suspended                  739

Drugs by study year

## Warning: Ignoring 1 observations

Drug-trials by classification

All drugs, by phase

phase                N
Early Phase 1     1916
Phase 1          36516
Phase 1/Phase 2   9476
Phase 2          50770
Phase 2/Phase 3   4830
Phase 3          38473
Phase 4          31452
NA               23867

Aggregate mentions by intervention ID.

## [1] "Mentions by intervention ID: 157862 / 171741 (91.9%)"

Aggregate mentions by trial.

## [1] "Mentions by study: 92966 / 99647 (93.3%)"

Aggregate mentions by drug.

## [1] "Mentions by drug name: 11108 / 58297 (19.1%)"
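
One way to read the three "N / M" figures above is as coverage: the number of groups with at least one resolved mention, over the total number of groups, for each grouping key. That reading is an assumption, as are the sample rows and the field names `id`, `nct_id`, `name`, and `smiles` in this sketch.

```python
# Hypothetical mention rows; "smiles" is None when no structure was resolved.
ASPIRIN = "CC(=O)OC1=CC=CC=C1C(=O)O"
mentions = [
    {"id": 1, "nct_id": "NCT-A", "name": "aspirin", "smiles": ASPIRIN},
    {"id": 2, "nct_id": "NCT-A", "name": "placebo", "smiles": None},
    {"id": 3, "nct_id": "NCT-B", "name": "aspirin", "smiles": ASPIRIN},
    {"id": 4, "nct_id": "NCT-C", "name": "study drug", "smiles": None},
]


def coverage(rows, key):
    """Return (groups with >= 1 resolved mention, total groups) for a key."""
    all_groups = {r[key] for r in rows}
    resolved_groups = {r[key] for r in rows if r["smiles"]}
    return len(resolved_groups), len(all_groups)


for label, key in [("intervention ID", "id"), ("study", "nct_id"), ("drug name", "name")]:
    hit, total = coverage(mentions, key)
    print(f"Mentions by {label}: {hit} / {total} ({100.0 * hit / total:.1f}%)")
```

Under this reading, low coverage by drug name alongside high coverage by intervention ID would mean that a small set of resolvable names accounts for most interventions.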

PubChem:

Intervention IDs to CIDs from PubChem (via SMILES)

## [1] "PubChem SMILES2CID hits: 3960 / 4698 (84.3%)"
## [1] "Intervention IDs mapped to PubChem CIDs (via SMILES): 153876"

InChIKeys from PubChem (via CIDs)

## [1] "PubChem CIDs with InChIKeys: 3801"
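
The report does not say how PubChem was queried. As one possibility, the two lookups (SMILES to CID, then CID to InChIKey) map onto PubChem's PUG REST interface. The sketch below only builds request URLs and makes no network calls; the helper names are hypothetical.

```python
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def smiles_to_cid_url(smiles):
    """Build a PUG REST URL that returns the CID(s) for one SMILES string."""
    # Percent-encode the SMILES, since it may contain '/', '#', '=' and so on.
    return f"{PUG_REST}/compound/smiles/{quote(smiles, safe='')}/cids/TXT"


def cids_to_inchikey_url(cids):
    """Build a PUG REST URL that returns InChIKeys for a list of CIDs."""
    cid_list = ",".join(str(c) for c in cids)
    return f"{PUG_REST}/compound/cid/{cid_list}/property/InChIKey/CSV"


print(smiles_to_cid_url("CC(=O)O"))
print(cids_to_inchikey_url([2244, 176]))
```

For SMILES that are long or contain characters that are awkward in a URL path, sending the SMILES in a POST body is an alternative.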

ChEMBL:

ChEMBL molecule IDs, and properties (via InChIKeys)

## Warning: 152 parsing failures.
##  row            col           expected                                                                                                                                                                                actual                                  file
## 1028 biotherapeutic 1/0/T/F/TRUE/FALSE {'biocomponents': [], 'description': 'SUBSTANCE P', 'helm_notation': 'PEPTIDE1{R.P.K.P.Q.Q.F.F.G.L.M.[am]}$$$$', 'molecule_chembl_id': 'CHEMBL235363'}                                '../data/aact_drugs_inchi2chembl.tsv'
## 1028 helm_notation  1/0/T/F/TRUE/FALSE PEPTIDE1{R.P.K.P.Q.Q.F.F.G.L.M.[am]}$$$$                                                                                                                                              '../data/aact_drugs_inchi2chembl.tsv'
## 1367 biotherapeutic 1/0/T/F/TRUE/FALSE {'biocomponents': [], 'description': 'TERLIPRESSIN', 'helm_notation': 'PEPTIDE1{G.G.G.C.Y.F.Q.N.C.P.K.G.[am]}$PEPTIDE1,PEPTIDE1,9:R3-4:R3$$$', 'molecule_chembl_id': 'CHEMBL2135460'} '../data/aact_drugs_inchi2chembl.tsv'
## 1367 helm_notation  1/0/T/F/TRUE/FALSE PEPTIDE1{G.G.G.C.Y.F.Q.N.C.P.K.G.[am]}$PEPTIDE1,PEPTIDE1,9:R3-4:R3$$$                                                                                                                 '../data/aact_drugs_inchi2chembl.tsv'
## 1389 biotherapeutic 1/0/T/F/TRUE/FALSE {'biocomponents': [], 'description': None, 'helm_notation': 'PEPTIDE1{A.S.T.T.T.N.Y.T}$$$$', 'molecule_chembl_id': 'CHEMBL180971'}                                                    '../data/aact_drugs_inchi2chembl.tsv'
## .... .............. .................. ..................................................................................................................................................................................... .....................................
## See problems(...) for more details.
## [1] "ChEMBL compounds mapped via InChIKeys: 3332"

ChEMBL activities (via compounds)

## [1] "ChEMBL activities (with pChembl): 124438"

ChEMBL target IDs (via activities)

## [1] "ChEMBL target proteins: 3157"
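
The compound to activity to target chain can be sketched as a filter followed by a distinct-ID collection. The field names `molecule_chembl_id`, `target_chembl_id`, and `pchembl_value` follow the ChEMBL activity record; the sample rows and their IDs are placeholders, and the report's own filtering rules may differ.

```python
# Hypothetical activity rows; IDs are placeholders, not real ChEMBL IDs.
activities = [
    {"molecule_chembl_id": "M1", "target_chembl_id": "T1", "pchembl_value": 7.2},
    {"molecule_chembl_id": "M1", "target_chembl_id": "T2", "pchembl_value": None},
    {"molecule_chembl_id": "M2", "target_chembl_id": "T1", "pchembl_value": 5.9},
    {"molecule_chembl_id": "M3", "target_chembl_id": "T3", "pchembl_value": 8.1},
]

# Compounds that were mapped from the drug list (hypothetical).
compounds = {"M1", "M2"}


def activities_with_pchembl(rows, compound_ids):
    """Keep activities of the mapped compounds that carry a pChEMBL value."""
    return [
        r for r in rows
        if r["molecule_chembl_id"] in compound_ids and r["pchembl_value"] is not None
    ]


def target_ids(rows):
    """Collect the distinct target IDs referenced by a set of activities."""
    return {r["target_chembl_id"] for r in rows}


kept = activities_with_pchembl(activities, compounds)
print("Activities (with pChEMBL):", len(kept))
print("Targets:", sorted(target_ids(kept)))
```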

IDG/TCRD:

## [1] "ChEMBL target proteins mapped to TCRD (human): 1806"

Targets by organism (top 10):

## [1] "Organisms: 187"
##  [1] "                Homo sapiens:   1806"
##  [2] "           Rattus norvegicus:    529"
##  [3] "                Mus musculus:    238"
##  [4] "                  Bos taurus:     98"
##  [5] "                  Sus scrofa:     36"
##  [6] "             Cavia porcellus:     26"
##  [7] "       Escherichia coli K-12:     19"
##  [8] "       Oryctolagus cuniculus:     18"
##  [9] "            Escherichia coli:     17"
## [10] "  Mycobacterium tuberculosis:     17"

Human targets by Target Development Level (TDL):

## [1] "    Tbio:    224" "   Tchem:    868" "   Tdark:      7"
## [4] "   Tclin:    707"
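
A quick consistency check on the figures reported above: the four TDL counts should add up to the number of ChEMBL target proteins mapped to TCRD (human). The numbers are copied from this report; the variable names are only illustrative.

```python
# Counts copied from the report output above.
tdl_counts = {"Tclin": 707, "Tchem": 868, "Tbio": 224, "Tdark": 7}
human_targets = 1806  # "ChEMBL target proteins mapped to TCRD (human)"

total = sum(tdl_counts.values())
assert total == human_targets, f"TDL counts sum to {total}, expected {human_targets}"

# Share of each TDL class among the mapped human targets.
for tdl, n in tdl_counts.items():
    print(f"{tdl}: {n} ({100.0 * n / total:.1f}%)")
```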